Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources

نویسندگان

Yulia Tsvetkov

Shuly Wintner

چکیده

We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semantically Motivated Hebrew Verb-Noun Multi-Word Expressions Identification

Identification of Multi-Word Expressions (MWEs) lies at the heart of many natural language processing applications. In this research, we deal with a particular type of Hebrew MWEs, VerbNoun MWEs (VN-MWEs), which combine a verb and a noun with or without other words. Most prior work on MWEs classification focused on linguistic and statistical information. In this paper, we claim that it is essen...

متن کامل

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

متن کامل

Multiple attribute group decision making with linguistic variables and complete unknown weight information

Interval type-2 fuzzy sets, each of which is characterized by the footprint of uncertainty, are a very useful means to depict the linguistic information in the process of decision making. In this article, we investigate the group decision making problems in which all the linguistic information provided by the decision makers is expressed as interval type-2 fuzzy decision matrices where each of ...

متن کامل

Automatic Identification Of Non-Compositional Multi-Word Expressions Using Latent Semantic Analysis

Making use of latent semantic analysis, we explore the hypothesis that local linguistic context can serve to identify multi-word expressions that have noncompositional meanings. We propose that vector-similarity between distribution vectors associated with an MWE as a whole and those associated with its constitutent parts can serve as a good measure of the degree to which the MWE is composition...

متن کامل

Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest

This paper presents a supervised machine learning approach that uses a machine learning algorithm called Random Forest for recognition of Bengali noun-noun compounds as multiword expression (MWE) from Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multi-word expressions using Chunk information and various heuristic rules and (2) training the ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2011

Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources

نویسندگان

چکیده

منابع مشابه

Semantically Motivated Hebrew Verb-Noun Multi-Word Expressions Identification

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

Multiple attribute group decision making with linguistic variables and complete unknown weight information

Automatic Identification Of Non-Compositional Multi-Word Expressions Using Latent Semantic Analysis

Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest

عنوان ژورنال:

اشتراک گذاری